Sample code in a Python notebook to use Automatize as a Python library.
Observations:
You can use configured paths if you want to organize the directories:
import sys, os
root = os.path.join('automatize', 'assets', 'examples', 'Example')
# We assume this folder organization for the experimental environment:
prg_path = os.path.join(root, 'programs')
data_path = os.path.join(root, 'data')
res_path = os.path.join(root, 'results')
# OR, you can use the .jar method files in:
prg_path = os.path.join('automatize', 'assets', 'method')
To use helpers for data pre-processing, import the module preprocessing.py from the automatize package:
from automatize.preprocessing import *
The preprocessing module provides functions to work with the data:
Basic functions:

* readDataset: loads datasets as a pandas DataFrame (from .csv, .zip, or .ts)
* printFeaturesJSON: prints a default JSON file descriptor for Movelets methods (version 1 or 2)
* datasetStatistics: calculates statistics from a dataset DataFrame

Train and test split functions:

* trainAndTestSplit: splits a dataset (pandas DataFrame) into train/test (70%/30% by default)
* kfold_trainAndTestSplit: splits a dataset (pandas DataFrame) into k-fold train/test (80%/20% each fold by default)
* stratify: extracts trajectories from the dataset, creating a subset of the data (for when smaller datasets are needed)
* joinTrainAndTest: joins the train and test files into one DataFrame

Type conversion functions:

* convertDataset: default format conversions; reads the dataset files and saves them in .csv and .zip formats, also doing a k-fold split if not present
* zip2df: converts .zip files and saves to a DataFrame
* zip2csv: converts .zip files and saves to .csv files
* df2zip: converts a DataFrame and saves to .zip files
* zip2arf: converts .zip files and saves to .arf files
* any2ts: converts .zip or .csv files and saves to .ts files
* xes2csv: reads .xes files and converts to a DataFrame

cols = ['tid','label','lat','lon','day','hour','poi','category','price','rating']
df = joinTrainAndTest(data_path, cols, train_file="train.csv", test_file="test.csv", class_col = 'label')
df.head()
Joining train and test data from... automatize/assets/examples/Example/data
Reading train file... Done.
Reading test file... Done.
Saving joined dataset as: automatize/assets/examples/Example/data/joined.csv
Done.
--------------------------------------------------------------------------------
|   | tid | lat_lon | time | price | poi | weather | precip | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.0 6.2 | 345 | -1 | Home | Clear | 10 | Classs_False |
| 1 | 1 | 0.8 6.2 | 717 | 3 | Restaurant | Clouds | 20 | Classs_False |
| 2 | 1 | 3.1 11 | 1032 | 2 | Shopping | Clear | 10 | Classs_False |
| 3 | 1 | 4.3 16.9 | 1179 | 2 | University | Clear | 0 | Classs_False |
| 4 | 1 | 6 13.1 | 1344 | 1 | Restaurant | Clear | 0 | Classs_False |
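For reference, the 70/30 hold-out that trainAndTestSplit performs by default can be sketched in plain Python over trajectory ids (the library function works on the full DataFrame and stratifies by class; `train_test_split_ids` below is only a hypothetical illustrative helper):

```python
import math
import random

def train_test_split_ids(tids, train_ratio=0.7, seed=1):
    # Shuffle the trajectory ids and cut at the train ratio.
    # The library's trainAndTestSplit additionally stratifies by class.
    rng = random.Random(seed)
    ids = list(tids)
    rng.shuffle(ids)
    cut = math.floor(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

train_ids, test_ids = train_test_split_ids(range(10))
print(len(train_ids), len(test_ids))  # 7 3
```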
To k-fold split a dataset into train and test:
k = 3
train, test = kfold_trainAndTestSplit(data_path, k, df, random_num=1, class_col='label')
3-fold train and test split in... automatize/assets/examples/Example/data
Spliting Data: 0%| | 0/2 [00:00<?, ?it/s]
Done. Writing files ... 1/3
Writing TRAIN - ZIP|1: 0%| | 0/6 [00:00<?, ?it/s]
Writing TEST - ZIP|1: 0%| | 0/4 [00:00<?, ?it/s]
Writing TRAIN / TEST - CSV|1
Writing TRAIN - MAT|1: 0%| | 0/6 [00:00<?, ?it/s]
Writing TEST - MAT|1: 0%| | 0/4 [00:00<?, ?it/s]
Writing files ... 2/3
Writing TRAIN - ZIP|2: 0%| | 0/6 [00:00<?, ?it/s]
Writing TEST - ZIP|2: 0%| | 0/4 [00:00<?, ?it/s]
Writing TRAIN / TEST - CSV|2
Writing TRAIN - MAT|2: 0%| | 0/6 [00:00<?, ?it/s]
Writing TEST - MAT|2: 0%| | 0/4 [00:00<?, ?it/s]
Writing files ... 3/3
Writing TRAIN - ZIP|3: 0%| | 0/8 [00:00<?, ?it/s]
Writing TEST - ZIP|3: 0%| | 0/2 [00:00<?, ?it/s]
Writing TRAIN / TEST - CSV|3
Writing TRAIN - MAT|3: 0%| | 0/8 [00:00<?, ?it/s]
Writing TEST - MAT|3: 0%| | 0/2 [00:00<?, ?it/s]
Done. --------------------------------------------------------------------------------
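The fold assignment behind kfold_trainAndTestSplit can be sketched as a round-robin over trajectory ids, where each fold in turn becomes the test set (the library version also stratifies by class and writes the train/test files in several formats; `kfold_ids` is only an illustrative helper):

```python
def kfold_ids(tids, k=3):
    # Assign ids to k folds round-robin; fold i is the test set,
    # the remaining folds form the train set.
    folds = [[] for _ in range(k)]
    for i, tid in enumerate(tids):
        folds[i % k].append(tid)
    for i in range(k):
        test = folds[i]
        train = [t for j, f in enumerate(folds) if j != i for t in f]
        yield train, test

for train, test in kfold_ids(range(10), k=3):
    print(len(train), len(test))  # 6 4 / 7 3 / 7 3
```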
To run feature extraction methods, import the modules run.py or script.py from the automatize package:
from automatize.script import *
The gensh function is the starting point to generate scripts for the available methods:
* method: the method name to generate the scripts for
* datasets: dictionary for the datasets configuration
* params: dictionary of configuration parameters for scripting (described below)

method = 'hiper'
datasets = {'Animals.RawTraj': ['specific']}
params = {
'sh_folder': 'scripts', # where to generate script files
'folder': 'EXP2022', # folder prefix for result files
'k': 5, # number of folds - optional
'root': root, # root folder of the experimental environment
'threads': 10, # number of threads allowed (for movelets methods) - optional
'gig': 100, # GB of RAM memory limit allowed (for movelets methods) - optional
'pyname': 'python3', # Python command - optional
'runopts': '-TR 0.5', # other arguments to pass to the method line (-TR is the τ for HiPerMovelets) - optional
'timeout': '7d', # set a timeout to methods runtime (7d limits to 7 days)
}
gensh(method, datasets, params)
sh run-H-Animals-specific-10T.sh
'sh run-H-Animals-specific-10T.sh\n'
The available methods in automatize are declared here:
from automatize.helper.script_inc import BASE_METHODS, METHODS_NAMES
for method in BASE_METHODS:
print(method, ''.rjust(25-len(method), ' '), METHODS_NAMES[method])
MARC                       MARC
npoi                       NPOI-F
MM                         MASTERMovelets
MM+Log                     MASTERMovelets-Log
SM                         SUPERMovelets
SM-2                       SUPERMovelets-λ
SM+Log                     SUPERMovelets-Log
SM-2+Log                   SUPERMovelets-Log-λ
hiper                      HiPerMovelets
hiper-pivots               HiPerMovelets-Pivots
hiper+Log                  HiPerMovelets-Log
hiper-pivots+Log           HiPerMovelets-Pivots-Log
Alternatively, it is possible to run the methods directly from the automatize Python library:
from automatize.run import *
prefix = 'ExempleDS/run1'
Movelets(data_path, res_path, prefix, 'HL-specific', 'Descriptor_hp',
version='hiper', ms=False, Ms=False, prg_path=prg_path, jar_name='HIPERMovelets', n_threads=1)
MARC(data_path, res_path, 'ExempleDS', 'MARC-specific', train='train.csv', test='test.csv',
EMBEDDING_SIZE=100, MERGE_TYPE='concatenate', RNN_CELL='lstm', prg_path='automatize/marc')
sequences = [1,2,3]
features = ['poi']
POIFREQ(data_path, res_path, prefix, '', sequences, features,
method='npoi', doclass=True)
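Conceptually, the NPOI-F features are built from how often consecutive subsequences of a chosen attribute (here poi) of sizes 1, 2, and 3 occur in each trajectory. A rough, self-contained sketch of that counting step (`poi_sequences` is a hypothetical helper, not the library API):

```python
from collections import Counter

def poi_sequences(points, sizes=(1, 2, 3)):
    # Count every consecutive subsequence of the given sizes.
    counts = Counter()
    for s in sizes:
        for i in range(len(points) - s + 1):
            counts[tuple(points[i:i + s])] += 1
    return counts

traj = ['Home', 'Restaurant', 'Shopping', 'Restaurant']
freq = poi_sequences(traj)
print(freq[('Restaurant',)])         # 2
print(freq[('Home', 'Restaurant')])  # 1
```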
* The subfolder scripts contains auxiliary Python files that can be executed from the command line:
# For example, to merge the result files (needed for Movelets methods)
!python3 "automatize/scripts/MergeDatasets.py" $res_path/$prefix/HL-specific
To run classifiers on the HiPerMovelets results, import the module analysis.py from the automatize package:
from automatize.analysis import ACC4All, MLP, RF, SVM
a. To run the classifiers for each folder inside a result path prefix:
save_results = True
result_folder = 'HL-specific'
prefix = 'ExempleDS/run1'
ACC4All(os.path.join(res_path, prefix), result_folder, save_results)
HL-specific Done.
b. To run a specific classifier:
MLP(res_path, prefix, result_folder, save_results)
c. To run the classifiers from the shell:
!python3 automatize/scripts/Classifier-MLP_RF.py $res_path/$prefix $result_folder
To read the results, import the module results.py from the automatize package:
from automatize.results import *
a. To check the results (both in Python or in the shell command line):
check_run(res_path, True)
OK: HL ExempleDS 1 specific [100.000][5s]
b. To read the results into a DataFrame:
df = results2df(res_path, prefix, result_folder)
df
Looking for result files in automatize/assets/examples/Example/results/ExempleDS/run1/**/HL-specific/HL-specific.txt
|   | Dataset |   | HL-specific |
|---|---|---|---|
| 0 | ExempleDS/run1 | Candidates | 1,890 |
| 1 |   | Scored | 192 |
| 2 |   | Recovered | - |
| 3 |   | Movelets | 9 |
| 4 |   | ACC (MLP) | 100.000 |
| 5 |   | ACC (RF) | 100.000 |
| 6 |   | ACC (SVM) | 100.000 |
| 7 |   | Time (Movelets) | 0.038s |
| 8 |   | Time (MLP) | 5s |
| 9 |   | Time (RF) | 0.196s |
| 10 |   | Time (SVM) | - |
| 11 |   | Trajs. Compared | 2 |
| 12 |   | Trajs. Pruned | 4 |
To print the DataFrame result as a LaTeX-formatted table:
printLatex(df)
\begin{table*}[!ht]
\centering
\resizebox{\columnwidth}{!}{
\begin{tabular}{|c|r||r|}
\hline
Dataset & & HL-specific \\
\hline
\hline
\multirow{13}{2cm}{ExempleDS/run1}
& Candidates & 1,890 \\
& Scored & 192 \\
& Recovered & - \\
& Movelets & 9 \\
& ACC (MLP) & 100.000 \\
& ACC (RF) & 100.000 \\
& ACC (SVM) & 100.000 \\
&Time (Movelets) & 0.038s \\
& Time (MLP) & 5s \\
& Time (RF) & 0.196s \\
& Time (SVM) & - \\
&Trajs. Compared & 2 \\
& Trajs. Pruned & 4 \\
\hline
\end{tabular}}
\caption{Results for ExempleDS/run1 dataset.}
\label{tab:results_ExempleDS/run1}
\end{table*}
To export all results to DataFrame and save:
df = history(res_path)
df.to_csv('experimental_results.csv')
df
|   | # | timestamp | dataset | subset | subsubset | run | random | method | classifier | accuracy | runtime | cls_runtime | error | file | total_time | name | key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.649706e+09 | ExempleDS | specific | specific | 1 | 1 | HL | MLP | 100.0 | 38.0 | 5841.114 | False | automatize/assets/examples/Example/results/Exe... | 5879.114 | HL-specific-MLP | ExempleDS-specific-1 |
To read and visualize the resulting movelets, import the module movelets.py from the automatize package:
from automatize.movelets import *
prefix = 'ExempleDS/run1'
movs = read_movelets(os.path.join(res_path, prefix, 'HL-specific'))
movs
[{'lat': '0.8', 'lon': '6.2'},
{'lat': '4.3', 'lon': '16.9'},
{'precip': 2.0},
{'lat': '6', 'lon': '13.1'},
{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'},
{'lat': '0.4', 'lon': '6.7'},
{'lat': '3', 'lon': '13.5'}]
movelets_sankey(movs, attribute='lat') # or movelets_sankey(movs) -> to display all dimensions (may be confusing)
movelets_markov(movs, attribute='lat')
tree = createTree(movs.copy())
from anytree import RenderTree
root_node = convert2anytree(tree)
root_node = RenderTree(root_node)
root_node
Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00")
├── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'lat': '0.8', 'lon': '6.2'} (85.71%) - 0.00")
├── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'lat': '4.3', 'lon': '16.9'} (85.71%) - 0.00")
├── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'precip': 2.0} (85.71%) - 0.00")
├── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'lat': '0.4', 'lon': '6.7'} (85.71%) - 0.00")
├── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'lat': '3', 'lon': '13.5'} (85.71%) - 0.00")
└── Node("/{'lat': '5.8', 'lon': '16.5'}=>{'lat': '6.3', 'lon': '13'} (100.00%) - 0.00/{'lat': '6', 'lon': '13.1'} (80.00%) - 0.00")
convert2digraph(tree)
# By Tarlis Portela (2020)